MIT CSAIL Technical Report No 975, November 2004 


On Decision Procedures for Set-Valued Fields 


Viktor Kuncak, Martin Rinard


MIT Computer Science and Artificial Intelligence Laboratory 
Cambridge, Massachusetts, USA 


Abstract 


An important feature of object-oriented programming languages is the ability to 
dynamically instantiate user-defined container data structures such as lists, trees, 
and hash tables. Programs implement such data structures using references to 
dynamically allocated objects, which allows data structures to store unbounded 
numbers of objects, but makes reasoning about programs more difficult. Reasoning 
about object-oriented programs with complex data structures is simplified if data 
structure operations are specified in terms of abstract sets of objects associated 
with each data structure. For example, an insertion into a data structure in this 
approach becomes simply an insertion into a dynamically changing set-valued field 
of an object, as opposed to a manipulation of a dynamically linked structure linked 
to the object. 


In this paper we explore reasoning techniques for programs that manipulate data
structures specified using set-valued abstract fields associated with container
objects. We compare the expressive power and the complexity of specification
languages based on 1) decidable prefix vocabulary classes of first-order logic,
2) two-variable logic with counting, and 3) Nelson-Oppen combinations of multisorted
theories. Such specification logics can be used for verification of object-oriented
programs with supplied invariants. Moreover, by selecting an appropriate subset
of properties expressible in such a logic, the decision procedures for these logics yield
automated computation of lattice operations in an abstract interpretation domain, as
well as automated computation of abstract program semantics.


1 Introduction 


Analysis and verification of modern object-oriented programming languages 
poses unique challenges [50, 34, 44, 20]. In this paper we study a feature that
we consider essential for object-oriented programming: the ability to introduce
user-defined abstract data types, and create an unbounded number of
instances of these data types during program execution. Particular difficulties
arise when each data type instance is itself implemented using multiple dynamically
allocated objects that form a linked data structure. Our approach
for analyzing such programs is to use abstract set-valued fields as specification 
variables that describe operations of an abstract data type, and separate the 
analysis of the program into verifying the correctness of the implementation of 
the abstract data type with respect to the set specification and verifying the 
correctness of the rest of the program where the linked data structure is replaced
by abstract set-valued fields. We next give some more context for our work.


Global abstract data types. An important feature of modern programming 
languages is the ability to introduce user-defined abstract data types; such 
data types allow the developers to build applications on top of concepts that 
are most appropriate for the applications, as opposed to relying only on the 
concepts built into the language. Modules have been used successfully as 
a language mechanism for implementing abstract data types [33, 54, 39, 36], 
and are an effective way of specifying abstract data types if there is only one 
instance of the abstract data type in the program, or if the instances are 
implemented without using linked data structures. 


Linked data structures. Containers that store objects form a large class 
of user-defined data types. Such containers are often implemented as linked 
data structures (such as lists, trees, and hash tables) that use references to 
cells dynamically allocated on the heap. Reasoning about programs contain- 
ing linked data structures is difficult because there is no compile-time bound 
on the size and the complexity of the linking structure that can be created. 
Sophisticated shape analyses have been developed to statically analyze sets of
possible linking structures created by programs [37, 23, 15, 47, 16, 11]. Shape
analyses are generally effective at analyzing individual data structures, but
often have difficulty scaling to larger programs.


Hob project. One of the main design principles behind the Hob project
[30, 31, 58, 29] is that reasoning about programs with complex data structures
becomes simpler if data structure operations are specified in terms of abstract
sets of objects associated with each data structure. For example, an insertion
into a data structure in this approach becomes simply an insertion into a
dynamically changing set of objects, as opposed to a manipulation of a
dynamically linked data structure. Hob splits the verification of programs with
such data structures into two tasks: 1) using shape analysis to verify that
the data structure implementation conforms to the specification given in terms
of the abstract set variables, and 2) using only the abstract set variables in
the rest of the program to reason about the behavior of the data structure.
The use of different analysis techniques is possible because the Hob architecture
supports the combination of heterogeneous analysis plugins while analyzing a
single program. So far, we have used Hob to verify implementations of global
data structures, which are instantiated at compile time into a finite number
of instances. The focus on global data structures allowed us to use a static
module mechanism to encapsulate fields of objects and prevent representation
exposure, as well as to use the decidable theory of Boolean algebras [24, 48]
to reason about the finite number of abstract sets that specify data structures.
Our goal is to make Hob applicable to dynamically instantiated data
structures as well.


Dynamic instantiation of linked data structures. Dynamic instantiation 
of abstract data types is one of the key features of object-oriented programming
languages. Dynamic instantiation is typically achieved by associating
an abstract data type instance with an object, and using a field to attach the
underlying linked data structure to the object. We are currently extending Hob
to verify programs that use linked data structures that can be dynamically 
instantiated. In this approach, we specify a linked data structure attached to 
an object using a finite number of set-valued fields of an object. The result of 
abstracting the content of data structures in a program using this technique 
is a program that manipulates objects connected using relations. A relation 
in the resulting program can be either a function (whose value for a given 
object is the object referenced by an object-valued field), or a general relation 
(whose value for a given object is the set of objects stored in the data structure 
associated with the object). 

The generalization to dynamic instantiation of data structures in Hob requires
extensions to both phases of verification: 1) verification that the linked
data structure conforms to the set interface given by values of object fields
and 2) verification of the resulting program that uses objects with set-valued
fields. To address the first problem, we are extending the existing technique
in Hob with the techniques for specifying representations of individual
objects [42, 8, 6, 2, 3]; these extensions are necessary to ensure that the analysis
of one instance remains valid in the presence of other instances in the heap.

The topic of this paper is the second problem: verification of programs that 
manipulate objects with set-valued fields. Like [45], we are concerned with 
verification of clients of abstract data types, but we focus on specifications ex- 
pressed in terms of set-valued fields and derive a complete decision procedure 
for the constraints in our class. Our approach uses assume/guarantee reasoning
with user-supplied annotations to completely separate the analysis of the
implementation of the class from the analysis of the context; other approaches
attempt to automatically infer both the approximation of the context and the
approximation of the class implementation [34], potentially using a global fixpoint
analysis.


Decision procedures for set-valued fields. To study the automation of 
reasoning about programs with set-valued fields, we explore decision proce- 
dures for constraints on such fields. Our constraints can express relationships 
between sets associated with the same object, the aliasing between object
references, as well as the relationships between sets associated with different
objects. By annotating programs with such constraints and using a
verification-condition generator [58], developers can verify a range of invariants of
object-oriented programs. Moreover, by selecting an appropriate subset of properties
expressible using such constraints, a decision procedure for these constraints
yields automated computation of lattice operations in an abstract interpretation
domain, as well as automated computation of abstract program semantics
(transfer functions) for the analysis [9].

    assume x ≠ null ∧ x ∈ alloc;
    oldxc := x.c;
    new y;
    while [x ≠ null ∧ y ≠ null ∧ x ≠ y ∧ x.c ∪ y.c = oldxc]
          (x.c ≠ ∅)
    {
        e := removeFirst(x);
        // process(e);
        insert(y, e);
    }
    assert y.c = oldxc;


Fig. 1. An example program fragment that manipulates set-valued fields. Here x.c
denotes the value of the set associated with the object denoted by x.


    e := removeFirst(x) :
        havoc e;
        assume e ∈ x.c;
        x.c := x.c \ {e}
    insert(y, e) :
        y.c := y.c ∪ {e}


Fig. 2. Specifications of procedure calls from Figure 1 


Contributions and overview. To motivate the constraints studied in this
paper, we present an example in Section 2. We present our formal setup in 
Section 3. As the main result of this paper, we explore reasoning techniques 
for programs that use set-valued abstract fields by comparing the expressive 
power and the complexity of specification languages based on decidable prefix 
classes of first-order logic (Section 5), two-variable logic with counting (Sec- 
tion 6), and Nelson-Oppen combinations of multisorted theories (Section 7). 
We observe that both the decidable prefix class $[\exists^*\forall^*]_=$ and Nelson-Oppen
combination yield optimal NP algorithms for deciding an interesting class of 
constraints. On the other hand, the use of two-variable logic with counting 
allows more expressive constraints (such as the constraint that a field is never 
null), but requires an NEXPTIME decision procedure in general. We present 
our preliminary conclusions in Section 8 and discuss related work throughout 
the paper. 


2 Example 


Figure 1 presents an example program fragment containing a precondition 
(expressed using an assume statement), a loop invariant (expressed using [. . .] 
brackets just before the condition of the while loop), and a postcondition 
(expressed using an assert statement). The program fragment empties the set 
x.c and copies its content into the set y.c (one could imagine some processing 
of primitive fields of e being performed in each loop iteration, but this is of 
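no relevance to our example). The property that we wish to verify is that
the content of the set y.c at the end of the program fragment is equal to
the original content of the set x.c, which is stored in the auxiliary set-valued
local variable oldxc. The property is true, because procedure call removeFirst
removes an element from x.c and returns it in e, and then insert inserts the
same element into y.c. Figure 2 shows guarded-command specifications of
procedure calls that we use to reason about the effects of procedures; our
system verifies separately that procedures conform to their specifications.


loop invariant initially holds:

$$\begin{array}{l}
x \neq \mathit{null} \land x \in \mathit{alloc} \Rightarrow \\
\quad y \notin \mathit{alloc} \land y \neq \mathit{null} \land y.c = \emptyset \Rightarrow \\
\quad\quad x \neq y \land x \neq \mathit{null} \land y \neq \mathit{null} \land x.c \cup y.c = x.c
\end{array}$$

loop invariant is preserved:

$$\begin{array}{l}
x \neq y \land x \neq \mathit{null} \land y \neq \mathit{null} \land x.c \cup y.c = \mathit{oldxc} \Rightarrow \\
\quad e \in x.c \Rightarrow x \neq y \land (x.c \setminus \{e\}) \cup (y.c \cup \{e\}) = \mathit{oldxc}
\end{array}$$

loop invariant implies postcondition:

$$\begin{array}{l}
x \neq y \land x \neq \mathit{null} \land y \neq \mathit{null} \land x.c \cup y.c = \mathit{oldxc} \land x.c = \emptyset \Rightarrow \\
\quad y.c = \mathit{oldxc}
\end{array}$$

Fig. 3. Verification conditions for Figure 1


$$\begin{array}{rcl}
O &::=& V_O \mid \mathit{null} \mid O.f_O \\
S &::=& V_S \mid S_1 \cup S_2 \mid S_1 \cap S_2 \mid S_1 \setminus S_2 \mid \{O_1, \ldots, O_n\} \mid O.f_S \\
f &::=& V_f \mid f_O[O_1 \mapsto O_2] \mid f_S[O \mapsto S] \\
A &::=& O_1 = O_2 \mid O \in S \mid S_1 = S_2 \mid \mathrm{card}(S) \leq k \mid f_1 = f_2 \\
F &::=& A \mid F_1 \land F_2 \mid F_1 \lor F_2 \mid \neg F
\end{array}$$

Fig. 4. Syntax of expressions and formulas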

Given the precondition, loop invariant and the postcondition for the pro- 
gram fragment in Figure 1, we can generate verification conditions that imply 
that the program postconditions will hold. Figure 3 shows these verifica- 
tion conditions for the program fragment. Note that the resulting constraints 
require not only reasoning about the content of individual sets (as in the se- 
mantics of insert), but also reasoning about aliasing of references to objects 
(as in the conjunct $x \neq y$) and reasoning about the relations between sets
associated with distinct objects (as in the conjunct $x.c \cup y.c = \mathit{oldxc}$).

In Section 3 we define a class of constraints on objects with set-valued 
fields, and introduce a guarded command language whose verification condi- 
tions belong to this class. In the rest of this paper we study the validity and 
the satisfiability problem for constraints in this class. 
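
As an informal sanity check of the example, the fragment of Figure 1 can be
executed on concrete finite sets. The following Python sketch is only a
simulation and is not part of the Hob system; the function name `run_fragment`
and the dictionary `c` that models the set-valued field c are assumptions made
for illustration. The `assert` statements mirror the loop invariant and the
postcondition.

```python
def run_fragment(x_content):
    """Simulate Figure 1 on concrete sets; c maps object names to their field c."""
    c = {"x": set(x_content)}          # assume x != null and x in alloc
    oldxc = frozenset(c["x"])          # oldxc := x.c
    c["y"] = set()                     # new y: fresh object with y.c = {}
    while c["x"]:                      # loop condition: x.c is non-empty
        # loop invariant: x.c ∪ y.c = oldxc (x != y holds by construction)
        assert c["x"] | c["y"] == oldxc
        e = next(iter(c["x"]))         # havoc e; assume e in x.c
        c["x"] = c["x"] - {e}          # x.c := x.c \ {e}
        c["y"] = c["y"] | {e}          # y.c := y.c ∪ {e}
    assert c["y"] == oldxc             # postcondition
    return c["y"]
```

Such a run checks the property only for one initial state and one choice of
removed elements, whereas the validity of the verification conditions
establishes it for all states.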


3 Specification Language 


We next introduce a specification language for expressing constraints on ob- 
jects with set-valued fields. The syntax of this language is in Figure 4. Our 
specification language is typed (multisorted); we are only concerned with
well-typed formulas. The nonterminal O denotes objects, which can be potentially
null, S denotes sets of non-null objects, and f denotes fields. Fields can map
objects to objects (then they are denoted $f_O$) or they can map objects to sets
(then they are denoted $f_S$). We use formulas (the nonterminal F) as part of
assume and assert statements, the conditions of while loops, and if statements
(which can be represented using assume and []). We use the object-valued and
set-valued terms of this language (the nonterminals O and S in Figure 4) on
the right-hand side of the assignment statements.


$$\begin{array}{rcl}
\mathrm{wp}(x := E,\ P) &=& P[x := E] \land \mathrm{WD}(E) \\
\mathrm{wp}(\mathsf{havoc}\ x,\ P) &=& \forall x.\ P \\
\mathrm{wp}(\mathsf{assert}\ Q,\ P) &=& Q \land P \\
\mathrm{wp}(\mathsf{assume}\ Q,\ P) &=& Q \Rightarrow P \\
\mathrm{wp}(s_1 \,[]\, s_2,\ P) &=& \mathrm{wp}(s_1, P) \land \mathrm{wp}(s_2, P) \\
\mathrm{wp}(s_1 ; s_2,\ P) &=& \mathrm{wp}(s_1, \mathrm{wp}(s_2, P))
\end{array}$$

Fig. 5. Weakest preconditions of guarded commands


    x.f := E        f := f[x ↦ E]
    new y           havoc y;
                    assume y ∉ alloc ∧ y ≠ null ∧ y.c = ∅;
                    alloc := alloc ∪ {y}

Fig. 6. Desugaring of some commands


$$\begin{array}{rcl}
\mathrm{WD}(x.f) &=& x \neq \mathit{null} \land \mathrm{WD}(x) \land \mathrm{WD}(f) \\
\mathrm{WD}(f[x \mapsto E]) &=& x \neq \mathit{null} \land \mathrm{WD}(f) \land \mathrm{WD}(x) \land \mathrm{WD}(E) \\
\mathrm{WD}(S_1 \cup S_2) &=& \mathrm{WD}(S_1) \land \mathrm{WD}(S_2)
\end{array}$$

Fig. 7. Key clauses in well-definedness of expressions

The meaning of constructs in this language is straightforward. Notation
x.f denotes a dereference of a field f of an object x, which can be thought of
as a function application that signals an error if the object x is null. Notation
$f_O[o_1 \mapsto o_2]$ denotes an update of an object-valued field $f_O$ so that $o_1.f_O = o_2$
and the value of the same field for all other objects is the same; such an update
operation corresponds to an array update if we view the field f as an array of objects
indexed by objects. Set operations in our language have standard meaning. In
the expression $\mathrm{card}(S) \leq k$, notation card(S) denotes the number of elements
(cardinality) of the set S, and k denotes a non-negative integer constant. For
complexity considerations, note that we represent integer constants in unary
notation, so a constant k has the length k as opposed to log k.

This paper considers decision procedures for the validity of formulas whose
syntax is given by nonterminal F in Figure 4. The validity of such formulas
can be used to show the validity of verification conditions in a programming
language. To illustrate this claim, we sketch a weakest-precondition semantics
of a guarded-command language in Figure 5. The language contains no
procedure calls or loops; these constructs can be transformed into loop-free
guarded commands using supplied loop invariants, procedure preconditions
and procedure postconditions (see, for example, [58, 14, 32]). Figure 6 presents
the desugaring of field assignment as well as the desugaring of the new statement
using a global variable alloc denoting the set of currently allocated objects
(the desugaring of new assumes that c is the only field of y). Figure 7 shows
some key clauses for computing the well-definedness condition of an expression;
this condition ensures there are no null dereferences while computing
the expression. We present the key cases that check for null on field update
and field dereference; the remaining cases simply take the conjunction of the
conditions for constituent subexpressions, as illustrated in Figure 7 for the
case of the $\cup$ operation.
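
The weakest-precondition rules can be read as predicate transformers. The
Python sketch below is an illustration under simplifying assumptions, not the
verification-condition generator of [58]: predicates are Python functions over
state dictionaries, havoc ranges over a small hypothetical finite `UNIVERSE`,
and the well-definedness conjunct WD(E) is omitted because all expressions in
the sketch are total. All function names are chosen for this example.

```python
UNIVERSE = ["a", "b"]  # hypothetical finite domain used to evaluate havoc

def assign(var, expr):
    # wp(x := E, P) = P[x := E]   (WD(E) omitted: expressions here are total)
    return lambda post: lambda s: post({**s, var: expr(s)})

def havoc(var):
    # wp(havoc x, P) = forall x. P, evaluated over the finite UNIVERSE
    return lambda post: lambda s: all(post({**s, var: v}) for v in UNIVERSE)

def assert_(q):
    # wp(assert Q, P) = Q and P
    return lambda post: lambda s: q(s) and post(s)

def assume(q):
    # wp(assume Q, P) = Q implies P
    return lambda post: lambda s: (not q(s)) or post(s)

def choice(s1, s2):
    # wp(s1 [] s2, P) = wp(s1, P) and wp(s2, P)
    return lambda post: lambda s: s1(post)(s) and s2(post)(s)

def seq(*stmts):
    # wp(s1 ; s2, P) = wp(s1, wp(s2, P)), generalized to a list of statements
    def wp(post):
        for st in reversed(stmts):
            post = st(post)
        return post
    return wp

# Specification of removeFirst from Figure 2, with x.c stored under key "xc".
remove_first = seq(
    havoc("e"),
    assume(lambda s: s["e"] in s["xc"]),
    assign("xc", lambda s: s["xc"] - {s["e"]}),
)
```

For instance, `remove_first(lambda s: s["e"] not in s["xc"])` yields a
precondition that holds in every state of this finite setting, since the
removed element is no longer in the set after the call.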


4 Preliminary Observations 


We first make several observations on deciding the validity of formulas in the 
language of Figure 4. 


Boolean closure and satisfiability. First note that the language is closed 
under all boolean operations, which is a desirable property for program spec- 
ifications [26]. An important consequence of the boolean closure is that the 
validity problem for our constraints reduces to the satisfiability problem. In 
the sequel, we will therefore only consider the satisfiability problem. 


Propositional structure of constraints. Note further that our constraints
are quantifier-free. By a transformation into disjunctive normal form, the
satisfiability of constraints reduces to satisfiability of conjunctions of literals $A$
and $\neg A$ where $A$ is given by Figure 4. Algorithmically it is better to avoid the
transformation into disjunctive normal form, and view the satisfiability algo- 
rithm as a non-deterministic procedure that selects a satisfying assignment to 
atoms of the quantifier-free formula, and checks that the satisfying assignment 
corresponds to a satisfiable conjunction of literals [13]. In any case, we reduce 
satisfiability of constraints to satisfiability of conjunctions of literals. 
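
The non-deterministic selection of an assignment to atoms can be pictured with
the following Python sketch. It replaces the non-deterministic guess by
exhaustive enumeration, so it takes exponential time and serves only as an
illustration; the name `candidate_conjunctions` and the representation of a
formula as a Python function over a dictionary of atom values are assumptions
of this example.

```python
from itertools import product

def candidate_conjunctions(atoms, formula):
    """Yield each truth assignment to `atoms` that propositionally satisfies `formula`.

    Every yielded dictionary corresponds to a conjunction of literals
    (atom if True, negated atom if False) that must then be checked for
    satisfiability by the decision procedure for conjunctions.
    """
    for values in product([True, False], repeat=len(atoms)):
        assignment = dict(zip(atoms, values))
        if formula(assignment):
            yield assignment

# Example: the formula (x = y) or not (e in x.c) over two atoms.
atoms = ["x = y", "e in x.c"]
formula = lambda a: a["x = y"] or not a["e in x.c"]
candidates = list(candidate_conjunctions(atoms, formula))
```

The formula is satisfiable exactly when at least one of the candidate
conjunctions is satisfiable in the theory of the constraints.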


Translation to unnested form. Note finally that we can transform every 
conjunction of literals into an equisatisfiable unnested form which contains 
no nested terms. We transform a formula into unnested form by introducing 
fresh variables; these fresh variables become existentially quantified, because 
we are looking at satisfiability. In the resulting unnested form, each atomic 
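formula is of one of the following syntactic forms: $V_O^1 = V_O^2.f_O$, $V_S^1 = V_S^2 \cup V_S^3$,
$V_S^1 = V_S^2 \cap V_S^3$, $V_S^1 = V_S^2 \setminus V_S^3$, $V_S = \{V_O^1, \ldots, V_O^n\}$, $V_S = V_O.f_S$,
$V_f^1 = V_f^2[V_O^1 \mapsto V_O^2]$, $V_f^1 = V_f^2[V_O \mapsto V_S]$, $V_O^1 = V_O^2$, $V_O \in V_S$,
$V_S^1 = V_S^2$, $\mathrm{card}(V_S) \leq k$, $V_f^1 = V_f^2$.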
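
A minimal Python sketch of the unnesting step follows. The term representation
(nested tuples whose first component is an operator name, with plain strings
as variables) and the function name `unnest` are assumptions made for this
illustration.

```python
import itertools

def unnest(term, defs, fresh):
    """Return a variable naming `term`, appending unnested definitions to `defs`.

    Each definition is a pair (variable, (operator, argument variables...)),
    so no definition contains a nested term.
    """
    if isinstance(term, str):          # already a variable
        return term
    op, *args = term
    arg_vars = [unnest(a, defs, fresh) for a in args]
    v = f"v{next(fresh)}"              # fresh, implicitly existentially quantified
    defs.append((v, (op, *arg_vars)))
    return v

# Unnesting the left-hand side of  x.c ∪ y.c = oldxc
defs = []
top = unnest(("union", ("field", "x", "c"), ("field", "y", "c")),
             defs, itertools.count())
```

After the call, the original atom can be replaced by the equality between
`top` and `oldxc` together with the three definitions collected in `defs`.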

In the sequel we outline decidability of conjunctions of such unnested for- 
mulas and their negations. We consider three different methods. We pay 
most attention to the first method (Section 5). This method is based on a 
previously well-studied class of formulas; what we found interesting is that 
this class is applicable to such an expressive constraint language, and that it 
yields the optimal complexity bound for this class, namely NP. It is interesting 
to mention the use of Nelson-Oppen combination of theories because it shows 
that our problem can be naturally decomposed into individual problems each 
of which can be solved using previously identified theories. Our result also
implies that each of these individual theories can be shown decidable using
the result of Section 5. Finally, the use of two-variable logic with counting is
interesting because it shows how to express some additional constraints that
go beyond the language in Figure 4.


5 A Classical Prefix-Vocabulary Class 


In this section we outline our first technique for checking satisfiability of con- 
junctions of unnested literals. This technique is based on the class of universal 
formulas in first-order logic with a relational signature without function sym- 
bols of non-zero arity. We translate conjunctions of literals into equisatisfiable 
formulas in this class while introducing a constant number of universal quan- 
tifiers. 


The class $[\exists^*\forall^*]_=$. Define the class $[\exists^*\forall^q]_=$ as the set of all formulas of the
form $\exists x_1, \ldots, x_p.\ \forall y_1, \ldots, y_q.\ F$ where $p \geq 0$ and $F$ is a quantifier-free formula
of first-order logic with equality without function symbols. Let $[\exists^*\forall^*]_=$ be the
set of formulas $\bigcup_{q \geq 0} [\exists^*\forall^q]_=$. We then have the following two results [4, Page
258].


Fact 5.1 For any fixed $q$, satisfiability for $[\exists^*\forall^q]_=$ is in NP. The satisfiability
for $[\exists^*\forall^*]_=$ is in NEXPTIME.


The decision procedure for $[\exists^*\forall^*]_=$ can be based on the small model property
and amounts to generating models with at most one element for each exis- 
tentially quantified variable of a formula and evaluating the formula on those 
models. 
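
The small model argument can be made concrete with a brute-force search. The
Python sketch below is an assumption-laden illustration rather than a practical
procedure: existential variables are treated as constants, the quantifier-free
matrix is given as a Python function `matrix(model, ys)`, and all names
(`subsets`, `satisfiable`) are chosen for this example. It enumerates every
interpretation over domains with at most one element per existential variable,
which takes exponential time.

```python
from itertools import product

def subsets(items):
    """Yield all subsets of `items` as frozensets."""
    items = list(items)
    for bits in product([False, True], repeat=len(items)):
        yield frozenset(x for x, b in zip(items, bits) if b)

def satisfiable(consts, unary, binary, q, matrix):
    """Search for a model of  ∃consts. ∀y_1..y_q. matrix  with at most len(consts) elements.

    Returns a model (a dict interpreting constants and relation symbols)
    or None if no model of the bounded size exists.
    """
    for n in range(1, max(1, len(consts)) + 1):
        dom = range(n)
        pairs = list(product(dom, repeat=2))
        for cvals in product(dom, repeat=len(consts)):
            for uvals in product(list(subsets(dom)), repeat=len(unary)):
                for bvals in product(list(subsets(pairs)), repeat=len(binary)):
                    model = {**dict(zip(consts, cvals)),
                             **dict(zip(unary, uvals)),
                             **dict(zip(binary, bvals))}
                    # the universal quantifiers range over the whole domain
                    if all(matrix(model, ys) for ys in product(dom, repeat=q)):
                        return model
    return None

# ∃a,b. ∀y. S(a) ∧ ¬S(b) ∧ (S(y) ⇒ y = a)
sat_matrix = lambda m, ys: (m["a"] in m["S"] and m["b"] not in m["S"]
                            and (ys[0] not in m["S"] or ys[0] == m["a"]))
# ∃a,b. S(a) ∧ ¬S(b) ∧ a = b
unsat_matrix = lambda m, ys: (m["a"] in m["S"] and m["b"] not in m["S"]
                              and m["a"] == m["b"])
```

In this sketch, `satisfiable(["a", "b"], ["S"], [], 1, sat_matrix)` finds a
two-element model, while the search for `unsat_matrix` with `q = 0` returns
`None`.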




The idea of the translation. The translation of the language in Figure 4 
into the $[\exists^*\forall^*]_=$ class can be summarized as follows: 1) use unary relations to
represent sets, 2) use binary relations to represent object-valued and set-valued 
fields, and 3) use universal quantifiers to represent set operations. To make 
this approach work, we need to properly represent null references, eliminate 
array updates by case analysis, and carefully translate cardinality constraints 
to avoid introducing an unbounded number of quantifiers. 


Axioms for fields. We represent both object-valued and set-valued fields
using binary relations. To ensure that object-valued fields are not assigned
multiple values simultaneously, for each object-valued field $f_O$ we introduce a
conjunct $\forall x, y, z.\ f_O(x, y) \land f_O(x, z) \Rightarrow y = z$.


Representing null values. We identify two approaches for representing
variables denoting references that can be potentially null. The first approach 
represents references as sets of cardinality of at most one. In this approach 
the null reference is therefore an empty set. This approach is used in [47] 
as well as in the typestate flag analysis plugin of the Hob system [28]. The 
disadvantage of this approach is that reasoning about sets and relations is 
generally more difficult than reasoning about elements. 

In this paper we examine an alternative approach that retains the distinction
between sets and elements, and uses the constant null to represent
null references so that each field is a total function that may potentially have
the value null. To make the decision procedure for $[\exists^*\forall^*]_=$ applicable to such an
encoding, we need to overcome a difficulty that is fundamental to the small
model property of the $[\exists^*\forall^*]_=$ class: although it is possible to write axioms
that constrain binary predicates to have at most one value for each argument,
it is not possible to constrain them to have exactly one value. As a result,
the satisfiability check will consider models where some object-valued fields are not
total. We next make sure that such models do not pose a problem: we transform
the formula so that the following holds: if there is a model where some
object-valued fields are partial, then there is a completed model where these
fields have the value null. A completed model replaces the interpretation $[\![f]\!]$
with the interpretation $[\![f]\!] \cup \{(x, \mathit{null}) \mid \neg\exists y.\ (x, y) \in [\![f]\!]\}$. Here $f$ denotes an
object-valued field, and $[\![f]\!]$ denotes the interpretation of relation symbol $f$.

We first make sure to use only quantifiers that range over non-null objects.
We write $\forall^+ x.\ F$ as a shorthand for $\forall x.\ x \neq \mathit{null} \Rightarrow F$. As a result, when $x$ and
$y$ are universally quantified, the truth value of an atomic formula $f(x, y)$ is
not affected by completion. To ensure that a similar claim holds when one
of $x, y$ is an existentially quantified variable, we ensure that for each variable
appearing in a binary relation symbol, the conjunction of literals always
contains either $x \neq \mathit{null}$ or $x = \mathit{null}$. It then suffices to consider the case of
literals $f(x, y)$ and $\neg f(x, y)$ such that $y = \mathit{null}$ occurs as one of the literals.
Note that, if $f(x, y)$ holds in the original model, then we know that the completed
model will still satisfy $f(x, y)$. Therefore, the only problem is the case
of literals $\neg f(x, y) \land y = \mathit{null}$. We therefore transform such a conjunction into
$f(x, z) \land z \neq \mathit{null}$ for a fresh (existentially quantified) variable $z$. The result
of the transformation is equivalent on models where $f$ is total. We also introduce
the universal axiom $\forall x.\ \neg f(\mathit{null}, x)$ to simplify the structure of possible
models.


$$\begin{array}{l|l}
V_f^1 = V_f^2 & \forall^+ x, y.\ V_f^1(x, y) \Leftrightarrow V_f^2(x, y) \\
V_f^1 = V_f^2[V_O^1 \mapsto V_O^2] & \forall^+ x, y.\ V_f^1(x, y) \Leftrightarrow ((x = V_O^1 \land y = V_O^2) \lor (x \neq V_O^1 \land V_f^2(x, y))) \\
V_f^1 = V_f^2[V_O \mapsto V_S] & \forall^+ x, y.\ V_f^1(x, y) \Leftrightarrow ((x = V_O \land V_S(y)) \lor (x \neq V_O \land V_f^2(x, y)))
\end{array}$$

Fig. 8. Rules for transforming positive literals into the $[\exists^*\forall^*]_=$ fragment
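
The completion of a model can be illustrated on a finite relation. In the
following Python sketch the function name `complete`, the string `"null"`
standing for the null constant, and the choice to pass the set of objects to be
completed as a parameter are assumptions of this example; in the example call
only non-null objects are passed.

```python
def complete(field, objects, null="null"):
    """Return [f] ∪ {(x, null) | there is no y with (x, y) in [f]}.

    `field` is a set of pairs interpreting an object-valued field that may
    be partial; `objects` is the set of objects over which it is completed.
    """
    defined = {x for (x, _) in field}
    return set(field) | {(x, null) for x in objects if x not in defined}

# o1.f = o2 is defined, o2.f is undefined in the partial model
partial_f = {("o1", "o2")}
total_f = complete(partial_f, {"o1", "o2"})
```

In the completed relation every listed object has exactly one value, and the
pairs of the original relation are preserved.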


Translating positive literals. Modulo the treatment of null references
discussed above, the expected semantics of operations in our language yields
translation rules. Figure 8 shows the translation of positive atomic formulas.


Translating negative literals. To translate a negative literal, negate 
the translation of the underlying atomic formula as in Figure 8 by replacing 
universal quantifiers with existential quantifiers. Because we are looking at 
satisfiability, we drop existential quantifiers while making sure that the newly 
introduced variables are fresh. 


Translating cardinality constraints. We translate a positive cardinality
constraint $\mathrm{card}(V_S) \leq k$ by introducing $k$ fresh constants $a_1, \ldots, a_k$, replacing
the constraint with $V_S = \{a_1, \ldots, a_k\}$, and then translating the result as
in Figure 8. We translate the negative cardinality constraint $\neg(\mathrm{card}(V_S) \leq k)$,
which is equivalent to $\mathrm{card}(V_S) \geq k + 1$, by introducing fresh constants
$a_1, \ldots, a_{k+1}$, and replacing the constraint with

$$\bigwedge_{1 \leq i \leq k+1} V_S(a_i) \ \land \bigwedge_{1 \leq i < j \leq k+1} a_i \neq a_j.$$
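
The two cardinality translations can be generated mechanically. The Python
sketch below produces the resulting constraints as plain strings; the function
names, the constant prefix `a`, and the ASCII connectives `&` and `!=` are
assumptions made for this illustration.

```python
from itertools import combinations

def at_most(set_var, k, fresh="a"):
    """card(S) <= k  becomes  S = {a1, ..., ak} for fresh constants a1..ak."""
    consts = [f"{fresh}{i}" for i in range(1, k + 1)]
    return set_var + " = {" + ", ".join(consts) + "}"

def more_than(set_var, k, fresh="a"):
    """not (card(S) <= k)  becomes  k+1 pairwise distinct fresh constants in S."""
    consts = [f"{fresh}{i}" for i in range(1, k + 2)]
    members = [f"{set_var}({c})" for c in consts]
    distinct = [f"{c} != {d}" for c, d in combinations(consts, 2)]
    return " & ".join(members + distinct)
```

For example, `more_than("S", 1)` gives `S(a1) & S(a2) & a1 != a2`. The number
of generated conjuncts is quadratic in k, which is polynomial in the size of
the input because integer constants are written in unary notation.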


Complexity. We next show that satisfiability of formulas in Figure 4 is
NP-complete. We have carefully constructed our translation so that it introduces
a bounded number of quantifiers. Indeed, each conjunct introduces at most
three universal quantifiers. By moving these quantifiers to prenex position
using the transformation

$$(\forall x, y, z.\ F_1(x, y, z)) \land (\forall x, y, z.\ F_2(x, y, z)) \iff \forall x, y, z.\ (F_1(x, y, z) \land F_2(x, y, z))$$

we can write the formula in prenex form $[\exists^*\forall^3]_=$. Because the size of the
generated formula is polynomial in the size of the original formula and the 
time to generate it is polynomial, by Fact 5.1 we conclude that checking 
the satisfiability of one assignment to unnested atomic formulas is in NP. 
Unnested form is polynomial in the size of conjunction of literals that speci- 
fies an assignment to atomic formulas of a formula F in Figure 4, and picking 
an assignment to atomic formulas can be done in NP. By composing these 
two non-deterministic choices, we obtain an NP decision procedure for sat- 
isfiability of expressions. NP-hardness follows trivially because our language 
subsumes propositional logic. We conclude that the satisfiability of formulas 
in Figure 4 is NP-complete. 


Remarks on related work. Fragments of first-order logic based on quantifier
prefixes are systematized in [4, 17], where the $[\exists^*\forall^*]_=$ class is described
as the Bernays-Schönfinkel-Ramsey class. Finite model finding tools such as
Alloy [22], MACE [35] and Paradox [7] can therefore be used to check satisfiability
of such formulas. Resolution techniques [46] are also complete for this
class because the term model is finite.

Symbolic formulations of shape analysis such as [56, 55, 26, 53] can be
adapted to work with our specification language, so our decision procedure
can be used as a component of a shape analysis. The idea of exploiting the
parameterized complexity of $[\exists^*\forall^q]_=$ seems more generally useful. For example,
we can use it to obtain the fact that Boolean shape analysis constraints [26]
are NP-complete: it suffices to non-deterministically pick a satisfiable quantified
formula, and then observe that each of the quantified conjuncts is in
$[\exists^*\forall^q]_=$ for a fixed $q$. The work in [26] was influenced by [56], which points
out the importance of the $[\exists^*\forall^*]_=$ fragment itself [56, Section 3.4, Page 20].
The work in [21] studies extensions of the $[\exists^*\forall^*]_=$ fragment with transitive
closure and shows several decidability and undecidability results that delineate
the boundary between decidable and undecidable extensions of $[\exists^*\forall^*]_=$.
Decidable extensions of the $[\exists^*\forall^*]_=$ fragment are
useful for shape analysis of recursive structures. Nevertheless, by encapsulating
recursive data structures and specifying them using sets, even logics
without transitive closure can be useful in establishing high-level properties
of programs [25, 30, 27], following the idea that different levels of abstraction
require different reasoning techniques [29, 19].


6 Two-Variable Logics 


In this section we show that two-variable logic with counting, denoted $C^2$,
can be used to decide constraints in Figure 4, as well as some useful extensions
of these constraints. We consider the satisfiability problem and use a
language containing any number of constants, unary relation symbols, and
binary relation symbols.


Two-variable logics. The logic $C^2$ is a first-order logic 1) extended with
counting quantifiers $\exists^{\geq k} x.\ F(x)$, saying that there are at least $k$ elements $x$
satisfying formula $F(x)$ for some constant $k$, and 2) restricted to allow only
two variable names $x$, $y$ in formulas. Note that the variables $x$ and $y$ may be
reused via quantifier nesting, and that formulas of the form $\exists^{=k} x.\ F(x)$ and
$\exists^{\leq k} x.\ F(x)$ are expressible as Boolean combinations of formulas of the form
$\exists^{\geq k} x.\ F(x)$. The logic $C^2$ was shown decidable in [18] and the complexity
for the $C^2_1$ fragment of $C^2$ (with counting up to one) was established in [43].
Two-variable logic without counting, $L^2$, was known to be decidable previously
due to a finite model property [38]; in fact there is a doubly exponential bound
on the size of the finite model. On the other hand, two-variable logic with
counting does not have a finite model property [18]. The usefulness of
two-variable logic with counting for reasoning about relations between objects was
identified in [25, 27] and its use for encoding description logics can be found
in e.g. [5, 1].


Encoding into two-variable logic with counting. We next explain
how to encode the constraints in Figure 4 into two-variable logic with counting.
It turns out that most of the ideas of the encoding in Section 5 apply
to encoding using two-variable logic as well, because we only use at
most two universal quantifiers in Figure 8, and the existentially quantified
variables simply become constants in the language. To avoid using three
variables to express the fact that some relations are functions, we replace
$\forall x, y, z.\ f_O(x, y) \land f_O(x, z) \Rightarrow y = z$ with $\forall x.\ \exists^{\leq 1} y.\ f_O(x, y)$. Finally, we can even
express the cardinality constraints directly by replacing $\mathrm{card}(S) \leq k$ with
$\exists^{\leq k} x.\ S(x)$.

The additional expressive power of two-variable logic comes from expressing
constructs of the form $\forall x.\ \exists y.\ f(x, y)$. Such constructs allow us to
state non-null properties of objects, which are important for reasoning about
initialization of objects in object-oriented programming languages [12]. Moreover,
counting quantifiers can naturally express high-level application constraints
identified in the database community and object-oriented modelling
community as referential integrity, cardinality constraints, as well as role
constraints [27].
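
To make the meaning of counting quantifiers concrete, the Python sketch below
evaluates them on a given finite structure. It is a model checker for one
structure, not a decision procedure for the logic (which, as noted above, lacks
the finite model property); the function names are assumptions of this example.

```python
def count_at_least(k, domain, pred):
    """Evaluate the counting quantifier 'there exist at least k x with pred(x)'."""
    return sum(1 for x in domain if pred(x)) >= k

def count_at_most(k, domain, pred):
    """'At most k' is the negation of 'at least k + 1'."""
    return not count_at_least(k + 1, domain, pred)

def is_partial_function(domain, f):
    """For all x there is at most one y with f(x, y); uses only variables x and y."""
    return all(count_at_most(1, domain, lambda y: (x, y) in f) for x in domain)

def is_total(domain, f):
    """For all x there is at least one y with f(x, y)."""
    return all(count_at_least(1, domain, lambda y: (x, y) in f) for x in domain)
```

On the domain `[1, 2]`, the relation `{(1, 2)}` is a partial function that is
not total, while `{(1, 1), (1, 2)}` violates the functionality constraint.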


7 Nelson-Oppen Combination 


We next note that satisfiability of formulas in Figure 4 can be decided using a
multi-sorted Nelson-Oppen decision procedure that combines three individual
decision procedures: 1) two-level syllogistic expressed as a component
Nelson-Oppen procedure as discussed in [57], 2) uninterpreted function symbols [41]
in a multisorted language with function symbols whose result sort can be a set
sort, and 3) the extensional theory of arrays [40, 49]. Because an equivalence relation
on shared variables in the Nelson-Oppen procedure can be guessed using a
non-deterministic polynomial algorithm, and each individual decision procedure is
in NP, we conclude that a Nelson-Oppen combination decision procedure for
our language is also in NP.
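
The guess over shared variables amounts to choosing a partition of those
variables into classes of equal variables. The Python sketch below enumerates
all such partitions deterministically, which is exponential and stands in for
the non-deterministic guess only as an illustration; the function name
`partitions` is an assumption of this example.

```python
def partitions(variables):
    """Yield every partition of the list `variables` into equivalence classes."""
    if not variables:
        yield []
        return
    first, rest = variables[0], variables[1:]
    for part in partitions(rest):
        # put `first` into one of the existing classes ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or into a new class of its own
        yield [[first]] + part

arrangements = list(partitions(["x", "y", "e"]))
```

Each partition induces equalities within classes and disequalities across
classes, which are then passed to every component decision procedure.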

Note that we can use Nelson-Oppen combination in conjunction with the
techniques presented in Section 5 and Section 6 because the Nelson-Oppen method
allows quantifier-free combinations of formulas that themselves need not be
quantifier-free. The approach based on decomposing the language of Figure 4
into smaller Nelson-Oppen theories has the advantage of using previously
understood and efficient decision procedures that may be useful in other contexts.
Moreover, no special encodings are necessary because the use of sorts naturally 
decomposes constraints into the constraints of individual decidable theories. 


8 Conclusions 


We have outlined a range of possible techniques for solving constraints on
set-valued fields: the use of the $[\exists^*\forall^*]_=$ class of first-order logic, the use of
two-variable logic with counting, and the use of Nelson-Oppen combination of decision
procedures. In addition to these techniques, it may be possible to use
general-purpose theorem provers on formulas that are of interest to us [10, 51, 52].
We are currently examining the structure of constraints generated by the Hob 
system [29] to evaluate the effectiveness of different decision procedures. 


Acknowledgements. We thank Darko Marinov for useful comments on an 
earlier version of this paper and useful discussions about the use of sets in 
symbolic execution. 